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Abstract 

Although  nonparametric  regression  has  traditionally  focused  on  the  estimation  of 
conditional  mean  functions,  nonparametric  estimation  of  conditional  quantile  functions  is 
often  of  substantial  practical  interest.  We  explore  a  class  of  quantile  smoothing  splines, 
which  are  defined  as  solutions  to  a  penalized  quantile  regression  problem.  We  character- 
ize solutions,  as  splines,  i.e.  piecewise  polynomials,  and  discuss  computation  by  stan- 
dard linear  programming  techniques.  For  sufficiently  small  values  of  the  bandwidth 
parameter  the  solutions  interpolate  the  specified  quantiles  of  the  response  variable  at  the 
distinct  design  points,  while  for  sufficiently  large  bandwidths  solutions  specialize  to  the 
linear  regression  quantile  fit  (Koenker  and  Bassett(1978))  to  the  observations.  Because 
the  methods  estimate  conditional  quantile  functions  they  possess  an  inherent  robustness 
to  extreme  observations  in  the  response  variable.  Remarkably,  the  entire  path  of  solu- 
tions, in  the  quantile  parameter  or  the  bandwidth  parameter,  may  be  computed  efficiently 
by  parametric  linear  programming  methods.  Finally  we  note  that  the  approach  may  be 
easily  adapted  to  impose  monotonicity,  convexity,  or  other  constraints  on  the  fitted  func- 
tion. Two  examples  are  provided  to  illustrate  the  use  of  the  proposed  methods. 


KEYWORDS:   Nonparametric  regression,  quantiles,  splines,  change-points,  smoothing, 
bandwidth  selection. 


1.  INTRODUCTION 

Several  authors  have  recently  proposed  methods  for  nonparametric  estimation  of  condi- 
tional quantile  functions:  Truong  (1989)  following  the  pioneering  work  of  Stone  (1977)  on 
nearest  neighbor  methods,  Chaudhuri(1991),  Samanta  (1989)  and  Antoch  and  Janssen  (1989) 
using  kernel  methods,  and  White  (1991)  employing  neural  networks.  Hendricks  and  Koenker 
(1992)  discuss  regression  spline  models  and  apply  them  to  electricity  demand  data.  Cox  and 
Jones  in  the  discussion  of  Cole(1988),  reviving  a  suggestion  of  Bloomfield  and  Steiger  (1983), 
have  recently  proposed  estimating  quantile  smoothing  splines  which  minimize 

£px(yi-*te))  +  xjcgww)2<fe 

where  px(u)  =  u(z-  I(u  <  0))  is  the  check  function  of  Koenker  and  Bassett  (1978).  Here  the 
parameter  le  [0,  1]  controls  the  quantile  of  interest,  while  X  e  R+  controls  the  smoothness  of 
the  resulting  cubic  spline,  thus  generalizing  the  extensive  literature  on  classical  least-squares 
smoothing  splines  pioneered  by  Wahba(1990).  This  is  an  intriguing  idea,  and  has  also  been  men- 
tioned, for  example,  in  Cox(1983),  Eubank  (1988)  and  Utreras  (1981)  in  the  median 
pi/2(w)  =  ^  \u  I  case-  However,  the  resulting  quadratic  program  poses  some  serious  computa- 
tional obstacles.  A  recent  paper  by  Bosch,  Ye  and  Woodworth(1993)  discusses  an  interior  point 
algorithm  for  this  problem.  Obviously  the  computational  virtues  of  the  piecewise  linear  form  of 
the  first  term  of  the  objective  function  are  sacrificed  by  the  quadratic  fonn  of  the  smoothness 
penalty. 

One  is  thus  naturally  led  to  ask:  "Why  not  replace  (g"(x))~  in  the  penalty  by  \g"(x)\V 
The  median  special  case  of  this  problem  has  been  studied  in  a  remarkable  paper  by  Schuette 
(1978)  in  the  actuarial  literature.  We  will  show,  expanding  on  Schuette's  discrete  version  of  the 
problem  using  finite  differences,  that  minimizing  (1.1)  retains  the  linear  programming  form  of 
the  parametric  version  of  the  quantile  regression  problem  and  yields  solutions  which  are  easy  to 
compute.  Solutions  with  this  L  \  fonn  of  the  roughness  penalty  are  linear  splines  and  therefore 
provide  a  natural,  automatic  approach  to  estimating  certain  piece-wise  linear  change-point 
models.  An  application  of  this  sort  to  the  relationship  between  maximal  running  speed  and  body 
mass  of  terrestrial  mammals  is  provided  in  Section  3. 


2.  QUANTILE  SMOOTHING  SPLINES 
2.1.  The  Li  Roughness  Penalty 

.    In  prior  work  (Koenker,  Ng,  and  Portnoy(1992)),  we  have  considered  the  problem  of 
minimizing 

^xb]  =  £pt(yi-*Cfi))  +  A.f1I^W|£fc  (i.i) 

/=i 

with  0=yq<^i<  •  •  •  <xn<xn+\=l,  over  the  (Sobolev)  space  W\  of  continuous  functions  on  [0,  1] 


with  absolutely  continuous  first  derivative  and  absolutely  integrable  second  derivative.  However, 
the  argument  given  there  that  the  solution  to  (1.1)  is  a  parabolic  spline,  i.e.  piecewise  quadratic, 
is  incorrect.  Indeed  the  problem  expressed  in  (1.1)  is  ill-posed  since  the  infimum  of  Rx>\  is  not 
attained  by  an  element  of  W\ .  The  situation  can  be  rectified  by  reformulating  the  problem  some- 
what adopting  the  approach  of  Fisher  and  Jerome  (1975)  and  Pinkus  (1988)  who  consider  closely 
related  problems  of  optimal  interpolation. 

We  should  begin  by  briefly  reviewing  some  results  on  optimal  interpolation.  For  an  integer 
k>2,  andp  e  [1,  «),  let 

l 

M\P  =  (j\f(x)\p)Vp 


and  Wp  denote  the  Sobolev  space  of  real  functions  on  [0,  1]  with  k  -  1  absolutely  continuous 
derivatives  and  kth  derivative  existing  almost  everywhere  as  a  function  in  Lp[0,  1].  We  wish  to 
consider  the  problem  of  finding  the  smoothest  interpolant  of  the  points  {(*,-,  y,),  i  ==  1,  •  •  • ,  n) 
in  the  sense  of  solving 

mn\\gik%:geWkp,g(Xi)=yhi=l,  •••,*}  (1.2) 

The  case  p  =  2  is  best  known,  yielding  splines  of  degree  2k-  1  with  knots  at  the  points 
{*,-,/  =  1,  •  •  •  ,  n).  We  are  primarily  interested  in  the  case  of p  =  1  andp  =  «>,  which  have  been 
treated  by  deBoor  (1976),  Fisher  and  Jerome  (1975)  and  Pinkus  (1988). 

Forp  =  1,  apparently  Fisher  and  Jerome  (1975)  were  the  first  to  observe  that  (1.2)  has  no 
solution  for  g  €  W \.  They  showed  that  if  W\  is  expanded  to  include  functions  whose  kth 
derivatives  are  measures,  the  expanded  problem  does  have  a  solution  s,  as  a  spline  of  degree 
k  —  1,  that  the  total  variation  of  its  (k  -  l)th  derivative,  VC?^-1^),  coincides  with  the  extremal 
value  of  (1.2),  and  that  the  measure  s^  is  concentrated  on  n,  or  fewer,  points.  Pinkus  (1988) 
has  refined  this  characterization  somewhat  and  has  provided  considerable  further  generalization. 

To  bridge  the  gap  between  the  smoothing  problem  posed  in  (1.1)  and  the  optimal  interpola- 
tion problem  (1.2),  we  may  simply  observe  that  any  solution,  g,  to  the  former  must  also  solve  the 
latter  in  the  sense  that  g  must  interpolate  itself  at  the  observed  { xt }  and  therefore  must  minimize 
the  roughness  penalty,  subject  to  a  given  fidelity  constraint.  Thus  to  determine  the  form  of  the 
solution  to  the  smoothing  problem  it  suffices  to  consider  the  interpolation  problem. 

It  remains  to  consider  the  question  of  knot  selection  for  the  p  =  1  case.  Pinkus  (1988), 
under  somewhat  restrictive  conditions  on  the  y/'s,  notes  that  for  k  =  2,  the  case  of  primary 
interest  here,  the  knots  of  the  optimal  spline  coincide  with  the  observed  JC/.  That  this  is  true  for 
any  configuration  of  v,'s  can  be  argued  as  follows.  Let  /  be  any  interpolator  of  the  points 
{(*;>  yi)'  1  =  1,  -  •  ■ ,  n)  with  an  absolutely  continuous  first  derivative.  Recall,  e.g.  Natanson 
(1974,  p.  259),  that  the  total  variation  of  an  absolutely  continuous  function  is  the  integral  of  the 
absolute  value  of  its  derivative,  thus  we  may  write, 


v(f)=j\r\x)\dx 


Now,  by  the  mean  value  theorem,  let  w,-  e  (xit  jtJ+i)  be  such  that  f{u{)  -  0>;+i  -yi)/(xi+i  -x{) 
fori  =  1,  •  •  •  ,  aj-1.  Then, 

v  (f)  *  Z;  I  J^Vm*  I  *  2  I fim  + 1 )  -  /*(«*)  I  =  ^  (h 

where  /is  the  piecewise  linear  interpolator  with  knots  at  the  x%.  Finally,  note  for  any  continuous 
piecewise  linear  g  there  exists  a  sequence  of  functions  [gn]  with  absolutely  continuous  first 
derivative  such  that 

lim  V(gn')  =  V(g') 

and  thus  by  the  foregoing  argument /minimizes  V(g)  for  all  such  g. 

Thus,  following  Pinkus  (1988)  if  we  expand  our  original  space  slightly  from  the  Sobolev 
space 

1 


W 


l  =  {g:g(x)=a0  +  alx  +  jQ(x-y)+h(y)dyy  h  e  Lu  a{  e  R,  i  =0,1} 


to 


U2  =  {g:g(x)  =  a0+alx+^(x-y)+dii(y),  Vrfti)<oo1  fl/eR,f  =  0,l} 

and  replace  the  L  i  penalty  on  g"  with  a  total  variation  of  penalty  on  g'  we  obtain, 

Theorem  1.    The  function  g  e  U~  minimizing  £Pt(v/  -gQr,))  +  XV (g')  is  a  linear  spline  with 
knots  at  the  points*,,  i  =  1,  ••-,«. 

Having  established  the  form  of  the  solution,  it  is  straightforward  to  develop  an  algorithm  to 
compute  g.  We  may  write 

g(x)  =  a.i  +  f>j(x-Xi)     xe[Xi,xi+i)    i  =  0,  —  ,  /i 

and  by  the  continuity  of  g 

g(Xi-)  =  g(Xi+)     i  =  l,  ••-,". 

Thus  writing  /j(  =x/+1  -x,-,  we  have  (3,  =  (a/+1  -  a,- )//?;.  The  penalty  may  thus  be  expressed  as 

V(g*)  =  "£  I  P/+i  -  ft  1=  X  I  (a/+2-a(+1)//2/+1  -  (a/+1  -  a,)//*,  | 
i=i  i=i 

Thus  parameterizing  g  by  the  ^-vector  a  =  (g  (*;)),  we  may  write  the  original  problem  as  a  linear 


program, 

n  n-\ 

min  2 Px(y/  -  a/)  +  A.  J  I  d/a  I 

where  df/  =  (0,  •  •  • ,  0,  h]1 ,  -(/ij+i  +AJ1).  ^J+i'  0»  "  * »  0),  7  =  1  ,,*'*,«—  1 .  In  the  impor- 
tant special  (median)  case  of  x  =  1/2,  we  can  view  this  as  simple  data  augmentation  and  therefore 
as  an  ordinary  least  absolute  deviation  (LAD)  regression  problem.  For  general  x,  further 
modifications  described  in  detail  in  Koenker  and  Ng  (1993)  and  implemented  for  "S"  (Becker, 
Chambers  and  Wilks  (1988))  are  conceptually  straightforward.  Since  we  have  n  free  parameters, 
a,  and  In  -  1  pseudo-observations,  solutions  must  have  n  zero  pseudo  residuals  by  complemen- 
tary slackness.  And  in  our  case  these  zeros  correspond  to  either  (i)  exact  interpolation  of  obser- 
vations, so  a,  =yi,  or  (ii)  linearity  of  g  at  an  internal  knot,  i.e.,  (3;+i  =  (3/  for  some  index  i. 
Obviously,  the  parameter  X  controls  comparative  "advantage"  of  these  two  alternative  means  of 
reducing  the  objective  function.  When  X  is  sufficiently  large  all  the  P,  will  be  equal  and  the 
solution  will  be  the  bivariate  linear  quantile  regression  fit  as  in  Koenker  and  Bassett  (1978). 
When  X  is  sufficiently  small,  all  n  observations  will  be  interpolated  when  the  design  points  are 
unique,  otherwise  the  xth  quantiles  at  each  distinct  design  point  are  interpolated. 


2.2.  Bandwidth  Choice 

As  in  any  smoothing  problem,  choice  of  "bandwidth",  here  represented  by  the  parameter  X, 
is  critical.  For  quantile  smoothing  splines,  the  problem  of  computing  a  family  of  solutions  for 
various  X  is  greatly  eased  by  the  fact  that  the  problem  is  a  parametric  linear  program  in  the 
parameter  X.  An  important  implication  of  this  fact  is  that  we  may  initially  solve  the  much 
smaller  linear  quantile  regression  problem  corresponding  to  X  =  »>  and  gradually  relax  the  rough- 
ness penalty  with  a  sequence  of  simplex  pivots,  thus  avoiding  a  direct  solution  of  a  potentially 
rather  large  problem.  It  might  be  noted  that  //  would  need  to  be  quite  large  by  the  usual  standards 
of  applied  statistics  in  order  that  the  resulting  problem  would  actually  loom  large  by  the  stan- 
dards of  contemporary  linear  programming. 

Each  transition  to  a  new  solution  of  the  parametric  linear  program  in  X  involves  a  single 
simplex  pivot  of  an  extremely  sparse  tableau,  and  hence  solving  for  a  broad  range  of  X  is  quite 
efficient.  The  situation  is  quite  analogous  to  the  problem  of  solving  for  the  entire  family  of 
quantile  regression  solutions  in  the  parameter  x,  described  originally  in  Bassett  and  Koenker 
(1982),  and  described  in  greater  detail  in  Koenker  and  d'Orey(1987). 

An  interesting,  and  important  aspect  of  the  way  that  solutions  depend  upon  the  penalty 
parameter  X  involves  the  number  of  interpolated  points.  In  the  classical  L2  smoothing  spline 
literature  much  has  been  made  of  the  "effective  dimensionality"  or  "degrees  of  freedom"  of  the 
estimated  curves  corresponding  to  various  X.  Such  measures  are  usually  based  on  the  trace  of 
various  quasi-projection  matrices  in  the  least-squares  theory.  See,  for  example,  Hastie  and 
Tibshirani  (1990)  for  a  cogent  discussion.  For  the  quantile  smoothing  spline  the  connection  is 
more  direct  in  the  sense  that  there  is  an  explicit  trade-off  between  the  number  of  interpolated 
points  and  the  number  of  linear  segments.    Since  "reasonable"  smoothing  suggests  that  the 


number  of  interpolated  points  is  small  relative  to  n,  it  is  probably  sensible  to  start  the  parametric 
programming  at  the  linear  quantile  regression  solution  rather  than  at  A.  =  0.  If  the  design  is  in 
"general  position"  so  no  two  observations  share  the  same  design  point,  there  must  be  at  least  2 
and  at  most  n  interpolated  ;y,'s.  Call  this  number p\.  Clearly,  px  is  a  plausible  measure  of  the 
effective  dimension  of  the  fitted  model  with  penalty  parameter  Xy  and  n-p\  +  \,  which 
corresponds  to  the  number  of  linear  segments  in  the  fitted  function,  is  a  plausible  measure  of  the 
degrees  of  freedom  of  the  fit.  Such  decompositions  may  be  used  in  conjunction  with  the  func- 
tion R  [g]  itself  to  implement  data-driven  bandwidth  choice.  The  criterion 

SIC  (pi)  =  log  (n~l  £pT(y/-5(x/))  +  '/2A2"1  px  log*, 
1=1 

which  may  be  interpreted  as  the  Schwarz(1978)  criterion  for  quantile  smoothing  spline  problem 
seems  to  perform  well  in  some  limited  applications.  Machado(1993)  considers  similar  criteria 
for  parametric  quantile  regression  and  more  general  M-estimators  of  regression. 


2.3.  The  Loo  Penalty 

Replacing  the  L  j  roughness  penalty  with  the  LM  penalty,  we  have 

X 

Again  we  may  focus  on  the  corresponding  interpolation  problem  which  has  an  extensive  litera- 
ture. Favard  (1940)  was  apparently  the  first  to  show  that  the  problem  of  minimizing  ||g*  '|U 
over 

{g:g(k)  <~,  g(Xi)=g0(Xi)    i  =  l,  ••-,«} 

for  fixed  function  go,  had  a  solution  which  was  a  polynomial  spline  of  degree  fc,  with  k^  deriva- 
tive zero  outside  [*!,*„]  and  less  than  n  -k  knots  all  simple  inside  {x\,xn).  Later  Karlin 
(1975),  deBoor  (1976)  and  Fisher  and  Jerome  (1975)  clarified  the  critical  role  of  perfect  splines, 
i.e.  splines  of  degree  k,  with  \g^  '  |  constant  as  solutions  to  this  "optimal  interpolation"  prob- 
lem. As  the  discussion  in  Powell  (1981,  Chapters  23-24,  see  especially  Figure  23.1)  makes  evi- 
dent the  perfect  spline  solutions  may  not  be  terribly  appealing.  They  are  obviously  required  to  be 
"uniformly  rough",  an  undesirable  feature  unless  the  target  function  has  this  property.  Further- 
more, the  knot  locations  of  the  optimal  perfect  spline  depend  upon  the  configuration  of  interpo- 
lated observations,  a  fact  which  greatly  complicates  their  computation.  Nevertheless  we  believe 
it  is  still  interesting  and  worthwhile  to  consider  the  LM  penalty  and  to  this  end  we  propose 
minimizing  over  quadratic  splines  with  knots  at  the  observed  x,.  Now 

g(x)  =  a,  +  P/Cx  - xi)  +  Ji(x  - xt)2     x  e  [x;,  xi+l )    i  =  0,  ■  •  • ,  n 

This  formulation  allows  us  to  rewrite  the  penalty  as 

|ls"||~=2max  |y,|. 

i 
Again  we  can  formulate   the  problem  as  a  linear  program,  but  now  the  penalty  appears  in  the 


role  of  linear  inequality  constraints.    As  above  let  h\  =jc/+1  —  jc/,   i 
continuity  constraints  of  the  quadratic  spline  we  have 

jihf  +  frht  +  a,-  =  a/+1 

Ijihi  +  p,-  =  p/+1 


=  1, 


,  n-\.    From  the 


for  i  =  1,  ...,  /i-l,  with  po  =  Pi  and  oco  =  oq .  We  require  that  the  quadratic  spline  be  linear  in  the 
exterior  intervals  [0,  X\)  and  (xn,  1].  Obviously  neither  the  roughness  penalty  nor  the  fidelity 
contribution  are  affected  by  this  requirement,  which  gives  us  Yo  =yn  =  0.  Eliminating  the  p's 
yields 


Jihi+yi+ihi+l  = 


ai+2  ~  a/  +  l 


hi+ 


a,-+i  -  a, 


hi 


i  =  1 ,  •  •  • ,  a  -2 


so  we  have  3(«  +  1)  parameters,  and  2{n  +  1)  linear  constraints.   Writing  0  =  (ft,  ai, 
as  the  (ai  +  1)  vector  of  free  parameters  we  have 


,  a„) 


where  Y=  (Yi,   •  '  •  ,  yn-\)',  K  isa.(n  -  1)  x  (n  -  1)  banded  matrix  with 

k  lf  !  =  1,    fcyt  y  =  /7y-,    A:y;  ;_i  =  /?;_!         7  =  2,     ■'•,/!  -1 

and  B  is  a  banded  (/i  -  1)  x  (/i  +  1)  matrix  with 

bn..ln+l  =h~li      j  =  2,  •  •  •  ,  n-1. 


Now  introducing  the  parameter  a  to  represent  the  bound  on  g",  we  may  write  the  problem  as 

min         {Spt(y/--vI-'e)  +  A.a,    Q6g  [-a, a]-"1 }. 

where  X  =  [0  :  /„].  Thus  we  are  once  again  faced  with  a  linear  program,  a  modified  version  of 
the  Bartels  and  Conn  (1980)  algorithm  for  linear,  inequality-constrained  LAD  problems  has  been 
implemented  for  S. 

The  L«  roughness  penalty  may  be  viewed  as  uniform  prior  on  g".  Each  X  implies  a 
corresponding  upper  bound  on  the  magnitude  ||#"|L-  This  is  quite  different  than  the  L\  case. 
Active  constraints  now  correspond  not  to  consecutive  segments  having  the  same  slope  as  in  the 
L\  case,  but  to  segments  on  which  |Y/ 1  =  o\  i.e.  where  g"  hits  the  permissible  upper  bound.  Of 
course  in  the  limiting  case  as  X  -»  «>  so  a  — »  0,  the  solution  is,  as  with  the  L  {  penalty,  the  linear 
x  regression  quantile  estimate.  In  contrast  to  the  piecewise  linear  form  for  g  with  a  few 
"elbows"  where  g'  jumps  which  characterizes  the  solution  for  the  L{  penalty,  the  L„  penalty 
enforces  a  uniform  bound  on  g"  and  this  straightens  the  elbows  and  introduces  modest  curvature 
over  longer  segments  to  compensate.  The  LM  solution  seems  visually  much  smoother  than  the 
piecewise  linear  L  \  solution.  Which  seems  preferable  is  obviously  application  dependent.  The 


comments  on  bandwidth  selection  above  apply  also  to  the  L«,  case. 
2.4.  Extensions 

Clearly  there  is  considerable  scope  for  other  forms  of  the  roughness  penalty.  We  have 
focused  on  the  L  \  and  LM  penalties  on  g",  but  other  Lp  norms  are  possible  as  are  other  differen- 
tial operators.  Indeed  it  might  be  interesting  to  explore  the  estimation  of  conditional  mean 
models  using  the  L\  and  £«,  penalties,  if  efficient  algorithms  for  the  resulting  quadratic  pro- 
gramming problems  could  be  developed.  The  simple  forms  considered  above  have  the  virtue 
that  their  linear  programming  formulation  makes  efficient  computation  immediately  practical. 

In  many  practical  applications  there  will  often  be  the  question  of  extending  these  methods 
to  multivariate  settings.  The  additive  spline  models  of  Hastie  and  Tibshirani  (1990)  and  others 
naturally  suggest  themselves.  Some  preliminary  plots  for  bivariate  x  look  quite  promising. 
Clearly  the  nonlinear  character  of  the  present  smoothers  vitiate  the  attractive  iterative 
"backfitting"  algorithms  available  in  the  / 2 -case.  But  feasible  estimators  may- still  be  possible 
using  a  limited  number  of  simplex  pivots  from  an  initial  linear  (in  covariates)  quantile  function 
estimate. 

There  are  a  number  of  intriguing  extensions  incorporating  further  constraints.  Monotoni- 
city  and  convexity  of  the  fitted  function  g  may  be  readily  imposed  by  simply  imposing  further 
linear  inequality  constraints  on  the  parameters  of  the  problem.  While  adding  such  inequality 
constraints  to  the  corresponding  1 2  problem  results  in  a  significant  increase  in  complexity, 
adding  linear  inequality  constraints  to  the  quantile  smoothing  spline  problems  does  not  alter  the 
fundamental  nature  of  the  optimization  problem  to  be  solved. 

Rates  of  convergence  for  these  splines  undoubtedly  parallel  classical  results  for  least- 
squares  smoothing  splines.  For  the  case  of  regression  splines  (i.e.,  for  unconstrained  p- 
dimensional  classes  of  B-splines,  with  p  -pn  — >  00),  He  and  Shi  (1992)  show  that  the  Li  con- 
vergence rate  is  n~  .  We  hope  to  report  further  on  the  asymptotic  behavior  of  quantile  smooth- 
ing splines  in  future  work. 

3.  SOME  EXAMPLES  AND  ILLUSTRATIONS 

Our  first  example,  based  on  Chappell  (1989),  explores  the  relationship  between  maximal 
running  speed  and  body  mass  of  terrestrial  mammals.  The  data,  collected  and  described  in  detail 
by  Garland  (1983)  is  plotted  in  Figure  3.1;  107  species  are  represented.  Two  groups  are 
identified  for  special  treatment  by  Chappell:  "hoppers"  which,  like  the  kangaroo,  ambulate  by 
hopping  and  are  labeled  by  the  plotting  character  h  in  the  figure,  and  "specialized",  labeled  S, 
which  like  the  hippopotamus,  the  porcupine,  and  man  "were  judged  unsuitable  for  the  inclusion 
in  analyses  on  account  of  lifestyles  in  which  speed  does  not  figure  as  an  important  factor."  For 
reference  we  have  included  Chappell's  quadratic,  and  single-changepoint,  log  linear  models. 
Both  are  estimated  by  least-squares,  both  omit  the  S  observations  and  fit  an  additive  shift  effect 
for  the  "hoppers". 

In  Figure  3.2  we  illustrate  two  cubic  smoothing  splines  estimated  by  minimizing  the  penal- 
ized least  squares  criterion 


The  solid  line  is  the  tit  when  the  entire  sample  is  included,  the  dotted  line  excludes  the  special 
animals  labeled  S.  In  both  cases  X  is  chosen  by  generalized  cross-validation  as  described,  for 
example,  in  Craven  and  Wahba  (1979).  One  can  immediately  see  the  lack  of  robustness  of  the 
least  squares  splines  to  the  slower  special  animals. 

Next  we  fit  the  entire  family  of  median  smoothing  splines  using  the  L  i  penalty.  There  are 
182  distinct  curves  corresponding  to  A.'s  ranging  from  0  to  <».  In  Figure  3.3  we  plot  3  of  these 
curves  for  X  =  {1.01,  12.23,  41.16}.  The  dimension  of  the  fitted  functions  represented  by  the 
number  of  interpolated  points  is  given  in  the  legend.  Like  the  least  squares  spline  in  the  previous 
figure,  these  estimates  are  based  on  all  the  observations.  However,  unlike  the  least  squares 
splines  which  estimate  the  conditional  mean  function,  these  median  splines  have  an  inherent 
robustness  to  outliers  in  the  vertical  direction.  As  in  parametric  quantile  regression,  points  may 
be  moved  up  or  down  in  the  plot  without  effecting  the  fitted  function  so  long  as  they  do  not  cross 
it.  The  property  follows  immediately  from  the  fact  that  the  subgradient  of  the  objective  function 
depends  upon  the  yt  only  through  the  signs  of  the  residuals  not  their  magnitude.  See  Koenker 
and  Bassett  (1978,  Thm  3.5). 

Using  the  SIC  criterion  to  choose  X  selects  the  solid  line  with  a  single  break.  This  fit  is 
remarkably  similar  to  Chappell's  preferred  single  changepoint  model,  particularly  in  view  of  the 
fact  that  we  have  done  none  of  the  preliminary  data  editing  which  seems  essential  to  the  success 
of  the  least  squares  based  methods.  The  simple  piecewise  linear  form  of  the  L  \  splines  make 
them  a  natural  technique  for  estimating  linear  changepoint  models. 

In  Figure  3.4  we  illustrate  several  distinct  quantile  smoothing  splines  for  the  same  data, 
again  based  on  the  L  \  penalty.  Here  the  upper  quantiles  are  of  particular  interest  since  they 
represent  the  envelope  of  biological  feasibility.  In  this  figure  we  have  again  chosen  X  for  the 
median  and  90th  percentile  by  the  SIC  criterion;  however,  SIC  produces  a  rather  rough  fit  with 
p(X)  =  8  for  the  25th  percentile  and  p(X)  =  7  for  the  75th.  So  we  have  selected  somewhat  larger 
Vs  for  these  curves  to  achieve  a  more  consistent  degree  of  smoothness.  Even  so,  the  viewer  will 
note  that  the  75th  and  90th  percentile  curves  cross  in  Figure  3.4  indicating,  perhaps,  that  the  75th 
may  still  be  somewhat  oversmoothed,  or  simply  that  there  is  not  enough  data  to  distinguish  these 
two  quantiles  for  the  larger  animals. 

In  Figure  3.5  we  illustrate  the  same  four  quantiles  this  time  employing  the  LM  penalty. 
Now  the  solutions  are  quadratic  splines  and  visually  smoother  than  the  linear  splines  produced 
by  the  L  i  penalty.  Nevertheless  the  qualitative  features  of  the  estimated  curves  are  quite  similar. 
Again,  SIC  was  quite  successful  except  at  the  .25  quantile.  In  this  case  we  again  increased  X  to 
achieve  a  more  consistent  degree  of  smoothness. 

Our  second  example  is  the  well-known  motorcycle  data  which  has  been  widely  analysed  in 
the  nonparametric  regression  literature.  The  data,  which  appear  in  Hiirdle  (1990),  are 
accelerometer  readings  taken  through  time  from  an  experiment  on  the  efficacy  of  motorcycle 
crash  helmets.   The  .r-coordinate  is  the  time  in  milliseconds  after  a  simulated  impact,  and  the 


response  variable  y  is  the  acceleration  (in  g)  of  the  head  of  the  test  dummy.  In  Figure  3.6  we 
illustrate  the  classical  smoothing  spline  estimate,  with  GCV  X,  as  well  as  the  L\  median 
smoothing  spline  based  on  SIC  selected  X.  An  interesting  feature  of  the  piecewise  linear  L  i  esti- 
mate is  that  unlike  other  estimates,  it  does  not  suggest  that  the  dummy  anticipates  the  crash, 
accelerating  its  head  prior  to  the  initial  deceleration. 

In  Figure  3.7  and  3.8  we  illustrate  several  quantiles  for  the  motorcycle  data  using  both  the 
L\  and  LM  penalties.  We  might  observe  that  the  piecewise  linear  L  \  estimate  are  very  similar  to 
the  L  i  -penalty  estimates  using  quadratic  splines  which  have  appeared  in  our  previous  work.  For 
reasonable  X  the  quadratic  splines  were  also  essentially  piecewise  linear  with  brief  quadratic  seg- 
ments to  connect  the  linear  stretches.  In  Figure  3.7,  the  three  estimated  quantile  functions  cross 
at  the  penultimate  point.  This  is  apparently  due  to  the  wide  separation  of  the  last  design  point 
from  the  others,  a  fact  that  has  prompted  other  investigators  to  delete  it  from  their  plots. 
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Fig  3.1:  Body  Mass  vs.  Maximal  Recorded  Running  Speed  of  107  Terrestrial  Mammals  with  two  estimated 
models  from  Chappell(1989) 
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Fig  3.2:  Mammal  Data  and  two  Least-Squares  (Cubic)  Smoothing  Splines 
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Fig  3.3:   Mammal  Data  and  three  Median  L\  Smoothing  Splines:   Effective  dimension  of  the  spline  is  p(A) 
indicated  in  the  legend 
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Fig  3.4:  Mammal  Data  and  four  Quantile  L\  Smoothing  Splines 
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Fig  3.5:  Mammal  Data  and  four  Quantile  Loo  Smoothing  Splines 
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Fig  3.6:  Motorcycle  Data  and  Mean  and  Median  Smoothing  Splines 
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Fig  3.7:  Motorcycle  Data  and  four  Quantile  L\  Smoothing  Splines 
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